array([[ 0.03807591, 0.05068012, 0.06169621, ..., -0.00259226,
0.01990749, -0.01764613],
[-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
-0.06833155, -0.09220405],
[ 0.08529891, 0.05068012, 0.04445121, ..., -0.00259226,
0.00286131, -0.02593034],
...,
[ 0.04170844, 0.05068012, -0.01590626, ..., -0.01107952,
-0.04688253, 0.01549073],
[-0.04547248, -0.04464164, 0.03906215, ..., 0.02655962,
0.04452873, -0.02593034],
[-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
-0.00422151, 0.00306441]])
(442, 10)
(442,)
Convert the diabetes feature matrix X into a pandas DataFrame and display it as a table.
| | age | sex | bmi | average_bp | S1_cholestrol | low_dl | high_dl | HD | ltg | glu |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 |
The first five rows of the diabetes feature data are shown in the table above.
Now convert the diabetes target y into a pandas DataFrame and append it to the X table as a new column.
| | progression |
|---|---|
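The conversion described above can be sketched as follows. This is a minimal sketch, assuming the data comes from scikit-learn's `load_diabetes`; the column names are copied from the table header above (they are the author's renaming, not scikit-learn's own feature names), and the variable name `diabetes_X` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Load the feature matrix X (442 x 10) and the target vector y (442,).
X, y = load_diabetes(return_X_y=True)

# Column names copied from the table above, spelling included.
# scikit-learn's own names are age, sex, bmi, bp, s1, ..., s6.
columns = ["age", "sex", "bmi", "average_bp", "S1_cholestrol",
           "low_dl", "high_dl", "HD", "ltg", "glu"]

# Wrap the NumPy array in a DataFrame so the columns are labelled.
diabetes_X = pd.DataFrame(X, columns=columns)

print(diabetes_X.shape)
print(diabetes_X.head())
```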
| 0 | 151.0 |
| 1 | 75.0 |
| 2 | 141.0 |
| 3 | 206.0 |
| 4 | 135.0 |
| | age | sex | bmi | average_bp | S1_cholestrol | low_dl | high_dl | HD | ltg | glu | progression |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.038076 | 0.050680 | 0.061696 | 0.021872 | -0.044223 | -0.034821 | -0.043401 | -0.002592 | 0.019907 | -0.017646 | 151.0 |
| 1 | -0.001882 | -0.044642 | -0.051474 | -0.026328 | -0.008449 | -0.019163 | 0.074412 | -0.039493 | -0.068332 | -0.092204 | 75.0 |
| 2 | 0.085299 | 0.050680 | 0.044451 | -0.005670 | -0.045599 | -0.034194 | -0.032356 | -0.002592 | 0.002861 | -0.025930 | 141.0 |
| 3 | -0.089063 | -0.044642 | -0.011595 | -0.036656 | 0.012191 | 0.024991 | -0.036038 | 0.034309 | 0.022688 | -0.009362 | 206.0 |
| 4 | 0.005383 | -0.044642 | -0.036385 | 0.021872 | 0.003935 | 0.015596 | 0.008142 | -0.002592 | -0.031988 | -0.046641 | 135.0 |
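One way to build the combined table above is `pd.concat` along the column axis. This is a sketch under the same assumptions as before: the data is loaded with `load_diabetes`, the column names are taken from the tables above, and the variable names (`diabetes_X`, `diabetes_y`, `diabetes`) are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

columns = ["age", "sex", "bmi", "average_bp", "S1_cholestrol",
           "low_dl", "high_dl", "HD", "ltg", "glu"]
diabetes_X = pd.DataFrame(X, columns=columns)

# The target becomes a one-column DataFrame named "progression".
diabetes_y = pd.DataFrame(y, columns=["progression"])

# axis=1 places the frames side by side; rows are matched on the
# default integer index, which both frames share.
diabetes = pd.concat([diabetes_X, diabetes_y], axis=1)

print(diabetes.head())
```

Because both frames use the same default index, no rows are dropped or misaligned by the concatenation.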
Visualization using Matplotlib
From the graph above, we can see that diabetes progression tends to be higher for people with a higher BMI and for older people.
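The original figure is not reproduced here, so the following is only a sketch of how such a plot could be drawn, not the author's actual chart. It assumes scatter plots of progression against BMI and against age, and it uses scikit-learn's own column names (`bmi`, `age`, `target`) from `load_diabetes(as_frame=True)`. The correlation coefficients are an added check on the visual impression.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

# as_frame=True returns a DataFrame with the features plus a "target" column.
df = load_diabetes(as_frame=True).frame

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Left panel: progression against BMI.
axes[0].scatter(df["bmi"], df["target"], s=10)
axes[0].set_xlabel("bmi (scaled)")
axes[0].set_ylabel("progression")

# Right panel: progression against age.
axes[1].scatter(df["age"], df["target"], s=10)
axes[1].set_xlabel("age (scaled)")

fig.tight_layout()
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() here.

# Pearson correlation of each feature with the target.
corr_bmi = df["bmi"].corr(df["target"])
corr_age = df["age"].corr(df["target"])
print("bmi vs progression:", corr_bmi)
print("age vs progression:", corr_age)
```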
Univariate Linear Regression Model Building
Linear regression model for Body mass index feature and progression
Coefficients and intercepts
LinearRegression()
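Coefficients: [981.65543614]
Intercept: 152.28824927379569
The slope is about 981.66, so the predicted progression rises as BMI rises (a positive relationship). The intercept, about 152.29, is the predicted progression when the scaled BMI value is 0; because the features in this dataset are mean-centered, that corresponds to an average BMI.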
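A univariate fit of this kind can be sketched as below. The train/test split is an assumption: the notebook's split parameters are not shown, so `test_size=0.2` and `random_state=0` are placeholders and the printed coefficient and intercept will generally differ from the values above.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Column 2 is bmi. Indexing with a list keeps the array 2-D,
# which is the shape scikit-learn estimators expect.
X_bmi = X[:, [2]]

# Assumed split; the original notebook's settings are unknown.
X_train, X_test, y_train, y_test = train_test_split(
    X_bmi, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# A prediction is just intercept + slope * bmi.
y_pred = model.predict(X_test)
manual_first = model.intercept_ + model.coef_[0] * X_test[0, 0]
print(y_pred[0], manual_first)
```

The last two lines show that `predict` applies the fitted line directly, which is what makes the slope and intercept interpretable.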
| | Actual | Predicted |
|---|---|---|
| 362 | 321.0 | 255.174269 |
| 249 | 215.0 | 211.794626 |
| 271 | 127.0 | 161.008702 |
| 435 | 64.0 | 129.267499 |
| 400 | 175.0 | 196.982065 |
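To make the gap between the two columns explicit, the five rows shown above can be put into a DataFrame with a residual column. This sketch uses only the values printed in the table; the `Residual` column and the variable name `comparison` are additions for illustration.

```python
import pandas as pd

# Values copied from the Actual/Predicted table above;
# the index holds the original row numbers of the test samples.
comparison = pd.DataFrame(
    {
        "Actual": [321.0, 215.0, 127.0, 64.0, 175.0],
        "Predicted": [255.174269, 211.794626, 161.008702,
                      129.267499, 196.982065],
    },
    index=[362, 249, 271, 435, 400],
)

# Residual = actual - predicted: positive means the model
# under-predicted, negative means it over-predicted.
comparison["Residual"] = comparison["Actual"] - comparison["Predicted"]

print(comparison)
print("Largest absolute residual at row:",
      comparison["Residual"].abs().idxmax())
```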
Presenting the solution
Mean Absolute Error: 52.94370285288119
Mean Squared Error: 4150.6801893299835
Root Mean Squared Error: 64.42577271038341
R2 Score: 0.19057346847560142
The mean absolute error of about 52.9 is high, which means the model is not giving accurate predictions; this is not surprising, since we selected only one independent variable (BMI). The root mean squared error of about 64.4 likewise shows that the predictions are often far from the measured values, and the R2 score of 0.19 means the model explains only about 19% of the variance in progression.
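The four metrics reported above follow from simple formulas, sketched here in plain Python. The helper `regression_metrics` is hypothetical, and it is applied only to the five rows displayed earlier, so its results will not match the full test-set figures. The final check confirms that the reported RMSE is the square root of the reported MSE.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R2 compares the model's squared error with that of always
    # predicting the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# The five rows shown in the Actual/Predicted table (illustration only).
actual = [321.0, 215.0, 127.0, 64.0, 175.0]
predicted = [255.174269, 211.794626, 161.008702, 129.267499, 196.982065]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, value)

# Consistency check on the figures reported for the full test set.
reported_mse = 4150.6801893299835
reported_rmse = 64.42577271038341
print(math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-6))
```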